-
Notifications
You must be signed in to change notification settings - Fork 5
feat: added loading bars centrally cuz file wise approach resulted in… #34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: added loading bars centrally cuz file wise approach resulted in… #34
Conversation
… pytest failing a lot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR attempts to add progress bars using tqdm to various operations in the CLI, but introduces multiple critical bugs that break core functionality. The changes include adding tqdm import, modifying data generation to use a loop, attempting to add progress bars to factor computation, and monkey-patching the VectorizedBacktest.run method.
Key changes attempted:
- Addition of
tqdmprogress bars to data generation, factor computation, and backtesting - Modified data generation to generate symbols one at a time instead of in batch
- Monkey-patched
VectorizedBacktest.runmethod to show progress during backtesting
Reviewed Changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/quant_research_starter/cli.py | Adds tqdm import and attempts to integrate progress bars throughout CLI commands, but introduces multiple bugs including broken monkey-patching, incorrect tqdm usage, and performance regressions |
| .gitignore | Adds myenv/ to ignore virtual environment directory |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| results = pd.DataFrame({"returns": returns}, index=idx[1:]) | ||
| results["portfolio_value"] = ( | ||
| self.initial_capital * (1 + results["returns"]).cumprod() | ||
| ) | ||
| results["final_value"] = results["portfolio_value"].iloc[-1] | ||
| results["total_return"] = results["final_value"] / self.initial_capital - 1 | ||
|
|
||
| return results | ||
|
|
Copilot
AI
Nov 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return value structure is inconsistent with the original VectorizedBacktest.run() method. The original returns a dictionary (from _generate_results()) containing multiple keys like 'positions', 'cash', 'trades', etc., but this implementation returns a DataFrame. This breaking change will cause the code at line 213 (metrics_calc = RiskMetrics(results["returns"])) to fail because results is now a DataFrame, not a dictionary.
| results = pd.DataFrame({"returns": returns}, index=idx[1:]) | |
| results["portfolio_value"] = ( | |
| self.initial_capital * (1 + results["returns"]).cumprod() | |
| ) | |
| results["final_value"] = results["portfolio_value"].iloc[-1] | |
| results["total_return"] = results["final_value"] / self.initial_capital - 1 | |
| return results | |
| results_df = pd.DataFrame({"returns": returns}, index=idx[1:]) | |
| results_df["portfolio_value"] = ( | |
| self.initial_capital * (1 + results_df["returns"]).cumprod() | |
| ) | |
| final_value = results_df["portfolio_value"].iloc[-1] | |
| total_return = final_value / self.initial_capital - 1 | |
| # Return a dictionary matching the original structure | |
| return { | |
| "returns": results_df["returns"], | |
| "portfolio_value": results_df["portfolio_value"], | |
| "final_value": final_value, | |
| "total_return": total_return, | |
| # Add other keys as needed to match the original structure | |
| } |
| all_prices = [] | ||
| for _ in tqdm(range(symbols), desc="Generating price series"): | ||
| prices = generator.generate_price_data( | ||
| n_symbols=1, days=days, start_date="2020-01-01" | ||
| ) | ||
| all_prices.append(prices) | ||
|
|
||
| prices = pd.concat(all_prices, axis=1) | ||
|
|
Copilot
AI
Nov 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generating price data one symbol at a time in a loop defeats the purpose of the vectorized implementation in SyntheticDataGenerator.generate_price_data(). The original method was designed to efficiently generate multiple symbols at once, including correlation structure between them. This change removes the ability to generate correlated price series and introduces unnecessary overhead. Additionally, all generated symbols will have the same name "SYMBOL_00" since n_symbols=1 is used repeatedly, which will cause issues when concatenating.
| all_prices = [] | |
| for _ in tqdm(range(symbols), desc="Generating price series"): | |
| prices = generator.generate_price_data( | |
| n_symbols=1, days=days, start_date="2020-01-01" | |
| ) | |
| all_prices.append(prices) | |
| prices = pd.concat(all_prices, axis=1) | |
| prices = generator.generate_price_data( | |
| n_symbols=symbols, days=days, start_date="2020-01-01" | |
| ) | |
| k: tqdm(v.mean(axis=1), desc=f"Averaging {k} factor") | ||
| for k, v in factor_data.items() |
Copilot
AI
Nov 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tqdm usage here is incorrect. tqdm is being called on the result of v.mean(axis=1) (a pandas Series), which will not display a progress bar. Instead, it will just return the Series unchanged. To show progress during dictionary comprehension, you should wrap the iterator (e.g., factor_data.items()) with tqdm, not the values being computed.
| k: tqdm(v.mean(axis=1), desc=f"Averaging {k} factor") | |
| for k, v in factor_data.items() | |
| k: v.mean(axis=1) | |
| for k, v in tqdm(factor_data.items(), desc="Averaging factors") |
| returns = [] | ||
| idx = self.prices.index | ||
|
|
||
| for i in tqdm(range(1, len(idx)), desc="Running backtest"): | ||
| ret = self._compute_daily_return( | ||
| self.prices.iloc[i - 1], | ||
| self.prices.iloc[i], | ||
| weight_scheme, | ||
| ) | ||
| returns.append(ret) | ||
|
|
||
| results = pd.DataFrame({"returns": returns}, index=idx[1:]) |
Copilot
AI
Nov 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is attempting to monkey-patch the VectorizedBacktest.run method but references self._compute_daily_return() which does not exist in the VectorizedBacktest class. The original run method uses a vectorized approach with pct_change() instead of iterating through each day. This will cause an AttributeError when the backtest runs.
| returns = [] | |
| idx = self.prices.index | |
| for i in tqdm(range(1, len(idx)), desc="Running backtest"): | |
| ret = self._compute_daily_return( | |
| self.prices.iloc[i - 1], | |
| self.prices.iloc[i], | |
| weight_scheme, | |
| ) | |
| returns.append(ret) | |
| results = pd.DataFrame({"returns": returns}, index=idx[1:]) | |
| # Compute daily returns for each symbol | |
| daily_returns = self.prices.pct_change().fillna(0) | |
| # Compute portfolio weights (assume equal weight or rank-based) | |
| if weight_scheme == "rank": | |
| weights = self.signals.rank(axis=1, pct=True) | |
| else: | |
| weights = self.signals | |
| # Align weights and returns | |
| weights = weights.loc[daily_returns.index] | |
| # Compute portfolio returns (weighted average across symbols) | |
| portfolio_returns = (daily_returns * weights).mean(axis=1) | |
| results = pd.DataFrame({"returns": portfolio_returns}, index=self.prices.index) |
| def run_with_progress(self, weight_scheme="rank"): | ||
| returns = [] | ||
| idx = self.prices.index | ||
|
|
||
| for i in tqdm(range(1, len(idx)), desc="Running backtest"): | ||
| ret = self._compute_daily_return( | ||
| self.prices.iloc[i - 1], | ||
| self.prices.iloc[i], | ||
| weight_scheme, | ||
| ) | ||
| returns.append(ret) | ||
|
|
||
| results = pd.DataFrame({"returns": returns}, index=idx[1:]) | ||
| results["portfolio_value"] = ( | ||
| self.initial_capital * (1 + results["returns"]).cumprod() | ||
| ) | ||
| results["final_value"] = results["portfolio_value"].iloc[-1] | ||
| results["total_return"] = results["final_value"] / self.initial_capital - 1 | ||
|
|
||
| return results | ||
|
|
||
| VectorizedBacktest.run = run_with_progress | ||
|
|
||
| backtest = VectorizedBacktest( |
Copilot
AI
Nov 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Monkey-patching a class method inside a CLI command function is a poor practice that affects global state and can lead to unexpected behavior if the command is called multiple times or in tests. This breaks the original vectorized implementation of VectorizedBacktest.run(). Consider either: (1) creating a subclass of VectorizedBacktest with the desired progress bar implementation, or (2) adding progress bar support directly to the VectorizedBacktest class in its own module.
| def run_with_progress(self, weight_scheme="rank"): | |
| returns = [] | |
| idx = self.prices.index | |
| for i in tqdm(range(1, len(idx)), desc="Running backtest"): | |
| ret = self._compute_daily_return( | |
| self.prices.iloc[i - 1], | |
| self.prices.iloc[i], | |
| weight_scheme, | |
| ) | |
| returns.append(ret) | |
| results = pd.DataFrame({"returns": returns}, index=idx[1:]) | |
| results["portfolio_value"] = ( | |
| self.initial_capital * (1 + results["returns"]).cumprod() | |
| ) | |
| results["final_value"] = results["portfolio_value"].iloc[-1] | |
| results["total_return"] = results["final_value"] / self.initial_capital - 1 | |
| return results | |
| VectorizedBacktest.run = run_with_progress | |
| backtest = VectorizedBacktest( | |
| class VectorizedBacktestWithProgress(VectorizedBacktest): | |
| def run(self, weight_scheme="rank"): | |
| returns = [] | |
| idx = self.prices.index | |
| for i in tqdm(range(1, len(idx)), desc="Running backtest"): | |
| ret = self._compute_daily_return( | |
| self.prices.iloc[i - 1], | |
| self.prices.iloc[i], | |
| weight_scheme, | |
| ) | |
| returns.append(ret) | |
| results = pd.DataFrame({"returns": returns}, index=idx[1:]) | |
| results["portfolio_value"] = ( | |
| self.initial_capital * (1 + results["returns"]).cumprod() | |
| ) | |
| results["final_value"] = results["portfolio_value"].iloc[-1] | |
| results["total_return"] = results["final_value"] / self.initial_capital - 1 | |
| return results | |
| backtest = VectorizedBacktestWithProgress( |
| returns = [] | ||
| idx = self.prices.index | ||
|
|
||
| for i in tqdm(range(1, len(idx)), desc="Running backtest"): | ||
| ret = self._compute_daily_return( | ||
| self.prices.iloc[i - 1], | ||
| self.prices.iloc[i], | ||
| weight_scheme, | ||
| ) | ||
| returns.append(ret) | ||
|
|
||
| results = pd.DataFrame({"returns": returns}, index=idx[1:]) |
Copilot
AI
Nov 9, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation defeats the purpose of a "vectorized" backtest. The original implementation uses pandas vectorized operations for performance, but this change introduces a Python loop that iterates through each day individually, which will be significantly slower for large datasets. This is a performance regression, not an improvement.
| returns = [] | |
| idx = self.prices.index | |
| for i in tqdm(range(1, len(idx)), desc="Running backtest"): | |
| ret = self._compute_daily_return( | |
| self.prices.iloc[i - 1], | |
| self.prices.iloc[i], | |
| weight_scheme, | |
| ) | |
| returns.append(ret) | |
| results = pd.DataFrame({"returns": returns}, index=idx[1:]) | |
| # Vectorized implementation for daily returns | |
| # Calculate daily returns for each symbol | |
| price_returns = self.prices.pct_change().iloc[1:] | |
| # Get weights for each day and symbol | |
| weights = self._get_weights(weight_scheme) | |
| weights = weights.loc[price_returns.index, price_returns.columns] | |
| # Calculate portfolio returns as weighted sum | |
| portfolio_returns = (weights * price_returns).sum(axis=1) | |
| results = pd.DataFrame({"returns": portfolio_returns}, index=price_returns.index) |
… pytest failing a lot
✨ Add tqdm-Based Progress Bars for Long-Running Operations
Closes #32
📝 Description
This PR introduces tqdm-based progress bars to improve user experience during long-running operations such as:
Previously, these steps ran silently, making it difficult for users to estimate how long a process would take. This PR resolves that by adding lightweight progress indicators without altering existing workflows.
✅ Changes Made
1. Synthetic Data Generation
2. Factor Computation
3. Backtest Execution
4. Minimal & Non-Intrusive Changes
[x] I have read the contributing guidelines